-
Notifications
You must be signed in to change notification settings - Fork 9.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
aws_route53_record: add create,update,delete timeout options for the resource #11895
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @abhinavdahiya 👋 Thank you for submitting this.
We have found in practice that generally when folks are asking for customizable timeouts, that usually there is actually another bug or change needed in the resource logic. This is usually due to how Terraform Plugin SDK helpers expect certain operations to complete (usually immediately) and how the AWS Go SDK causes operations to complete (potentially automatically retrying for extended periods of time without returning during throttling), but also sometimes due to the Terraform AWS Provider needing to tell the AWS Go SDK to automatically retry certain API responses that are not already retried automatically.
We went through a large effort to fix a class of these in #7873. This covered resource.Retry()
cases, but did not cover (resource.StateChangeConf).WaitForState()
since we had received less of these reports in practice and the fix implementation is markedly different. The fix in those cases was to retry one last time without time bounding so operators do not need to manually figure the appropriate timeframe out in their environment and can instead rely on the AWS Go SDK max retries (still configurable, but by default retries for roughly an hour on throttling).
In this specific case though, we instead can add the PriorRequestNotComplete
error code as a retryable error code for the ChangeResourceRecordSets
operation in aws/config.go
. An example of how this was done for a similar issue with a different resource can be found here:
Something like the below should get us close:
client.r53conn.Handlers.Retry.PushBack(func(r *request.Request) {
if r.Operation.Name == "ChangeResourceRecordSets" {
if isAWSErr(r.Error, route53.ErrCodePriorRequestNotComplete, "") {
r.Retryable = aws.Bool(true)
}
}
})
Then we can remove the changeRoute53RecordSet
usage of StateChangeConf
and associated 5 minute timeout completely. If you can change this implementation to the above, we can get this in to hopefully solve your (and others) problem without requiring operator intervention. Thanks!
66aa881
to
82afc9b
Compare
Hi @bflad ! thanks for the detailed response and the alternative implementation suggestion.. When i started looking into the adding So the logic from #7873 seemed like a better way to solve the timeout issue. Therefore i updated the functions to use that workflow, allows aws eventual consistency to kick in and then depend on SDK's internal backoff for PriorRequestNotComplete. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for those updates, @abhinavdahiya 😄 Two more minor things and this should be good to go. 👍
aws/resource_aws_route53_record.go
Outdated
var out *route53.ChangeResourceRecordSetsOutput | ||
err := resource.Retry(1*time.Minute, func() *resource.RetryError { | ||
var err error | ||
out, err = conn.ChangeResourceRecordSets(input) | ||
if isAWSErr(err, "InvalidChangeBatch", "") { | ||
// This means the record is already gone | ||
return nil | ||
} | ||
return resource.NonRetryableError(err) | ||
}) | ||
if isResourceTimeoutError(err) { | ||
out, err = conn.ChangeResourceRecordSets(input) | ||
if isAWSErr(err, "InvalidChangeBatch", "") { | ||
// This means the record is already gone | ||
return out, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplified without resource.Retry()
at all since there are no retryable conditions 👍
out, err := conn.ChangeResourceRecordSets(input)
if isAWSErr(err, "InvalidChangeBatch", "") {
return out, nil
}
return out, err
See also: #11864
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aws/resource_aws_route53_record.go
Outdated
log.Print("[DEBUG] Hosted Zone not found, retrying...") | ||
return resource.RetryableError(err) | ||
} | ||
return resource.NonRetryableError(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: In the future we will likely require the usage of resource.NonRetryableError()
/resource.RetryableError()
with non-nil errors:
return resource.NonRetryableError(err) | |
if err != nil { | |
return resource.NonRetryableError(err) | |
} | |
return nil |
See also: hashicorp/terraform-plugin-sdk#199
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
fixed with f5cf401
…DK backoff `PriorRequestNotComplete` is already part of the list of throttle codes [1], so if we use the SDK inbuilt backoff we shouldn't need custom state change logic that retries for 5 minutes. So this changes the changeRoute53RecordSet to resource.Retry and last retryOnTimeout from issue 7873 [2], such that we send request in resource.Retry for 1 minute and if there is a timeout, we send another request that then uses the AWS SDK default retry logic... while the deleteRoute53RecordSet now directly makes the request using SDK without any external retries. [1]: https://github.com/aws/aws-sdk-go/blob/3acad1210f5c4feb4866ff9874b5f3f5895bd96d/aws/request/retryer.go#L93 [2]: hashicorp#7873
82afc9b
to
f5cf401
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks @abhinavdahiya 🚀
Output from acceptance testing:
--- PASS: TestAccAWSRoute53Record_disappears (138.04s)
--- PASS: TestAccAWSRoute53Record_Alias_S3 (140.04s)
--- PASS: TestAccAWSRoute53Record_txtSupport (144.45s)
--- PASS: TestAccAWSRoute53Record_caaSupport (157.92s)
--- PASS: TestAccAWSRoute53Record_generatesSuffix (160.83s)
--- PASS: TestAccAWSRoute53Record_Alias_Elb (166.20s)
--- PASS: TestAccAWSRoute53Record_underscored (169.23s)
--- PASS: TestAccAWSRoute53Record_Alias_Uppercase (178.33s)
--- PASS: TestAccAWSRoute53Record_basic_fqdn (184.78s)
--- PASS: TestAccAWSRoute53Record_geolocation_basic (184.91s)
--- PASS: TestAccAWSRoute53Record_failover (192.05s)
--- PASS: TestAccAWSRoute53Record_weighted_basic (197.08s)
--- PASS: TestAccAWSRoute53Record_latency_basic (197.18s)
--- PASS: TestAccAWSRoute53Record_weighted_to_simple_basic (200.10s)
--- PASS: TestAccAWSRoute53Record_spfSupport (202.82s)
--- PASS: TestAccAWSRoute53Record_basic (212.78s)
--- PASS: TestAccAWSRoute53Record_wildcard (223.55s)
--- PASS: TestAccAWSRoute53Record_disappears_MultipleRecords (271.22s)
--- PASS: TestAccAWSRoute53Record_empty (119.67s)
--- PASS: TestAccAWSRoute53Record_longTXTrecord (122.34s)
--- PASS: TestAccAWSRoute53Record_multivalue_answer_basic (126.24s)
--- PASS: TestAccAWSRoute53Record_AliasChange (159.71s)
--- PASS: TestAccAWSRoute53Record_TypeChange (167.90s)
--- PASS: TestAccAWSRoute53Record_SetIdentifierChange (168.08s)
--- PASS: TestAccAWSRoute53Record_allowOverwrite (159.12s)
--- PASS: TestAccAWSRoute53Record_weighted_alias (351.02s)
--- PASS: TestAccAWSRoute53Record_Alias_VpcEndpoint (485.16s)
This has been released in version 2.49.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks! |
brings in fix for aws_record timeout [1] and other fixes mentioned in CHANGELOG [2] updated using ``` go get github.com/terraform-providers/terraform-provider-aws@f0f304894df67616dfbd675bc9687a7db266ad41 ``` Using the tag failed with error ``` go get github.com/terraform-providers/terraform-provider-aws@v2.49.0 go: finding github.com/terraform-providers/terraform-provider-aws v2.49.0 go: finding github.com/terraform-providers/terraform-provider-aws v2.49.0 go get github.com/terraform-providers/terraform-provider-aws@v2.49.0: github.com/terraform-providers/terraform-provider-aws@v2.49.0: invalid version: module contains a go.mod file, so major version must be compatible: should be v0 or v1, not v2 ``` Also removes the indirect dependecy on `aws-iam-authenticator` because of [3] [1]: hashicorp/terraform-provider-aws#11895 [2]: https://github.com/terraform-providers/terraform-provider-aws/blob/v2.49.0/CHANGELOG.md#2490-february-14-2020 [3]: hashicorp/terraform-provider-aws#11822
brings in fix for aws_record timeout [1] and other fixes mentioned in CHANGELOG [2] updated using ``` go get github.com/terraform-providers/terraform-provider-aws@f0f304894df67616dfbd675bc9687a7db266ad41 ``` Using the tag failed with error ``` go get github.com/terraform-providers/terraform-provider-aws@v2.49.0 go: finding github.com/terraform-providers/terraform-provider-aws v2.49.0 go: finding github.com/terraform-providers/terraform-provider-aws v2.49.0 go get github.com/terraform-providers/terraform-provider-aws@v2.49.0: github.com/terraform-providers/terraform-provider-aws@v2.49.0: invalid version: module contains a go.mod file, so major version must be compatible: should be v0 or v1, not v2 ``` Also removes the indirect dependecy on `aws-iam-authenticator` because of [3] [1]: hashicorp/terraform-provider-aws#11895 [2]: https://github.com/terraform-providers/terraform-provider-aws/blob/v2.49.0/CHANGELOG.md#2490-february-14-2020 [3]: hashicorp/terraform-provider-aws#11822
brings in fix for aws_record timeout [1] and other fixes mentioned in CHANGELOG [2] updated using ``` go get github.com/terraform-providers/terraform-provider-aws@f0f304894df67616dfbd675bc9687a7db266ad41 ``` Using the tag failed with error ``` go get github.com/terraform-providers/terraform-provider-aws@v2.49.0 go: finding github.com/terraform-providers/terraform-provider-aws v2.49.0 go: finding github.com/terraform-providers/terraform-provider-aws v2.49.0 go get github.com/terraform-providers/terraform-provider-aws@v2.49.0: github.com/terraform-providers/terraform-provider-aws@v2.49.0: invalid version: module contains a go.mod file, so major version must be compatible: should be v0 or v1, not v2 ``` Also removes the indirect dependecy on `aws-iam-authenticator` because of [3] [1]: hashicorp/terraform-provider-aws#11895 [2]: https://github.com/terraform-providers/terraform-provider-aws/blob/v2.49.0/CHANGELOG.md#2490-february-14-2020 [3]: hashicorp/terraform-provider-aws#11822
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks! |
Community Note
ChangeRecordSets now uses the AWS SDKs inbuilt retry method instead of StateConf change retry loop for 5 minutes.
Release note for CHANGELOG:
Output from acceptance testing:
Closes #11896